Fictitious Play with Time-Invariant Frequency Update 

for Network Security 

Kien C. Nguyen, Tansu Alpcan, and Tamer Başar 



Abstract — We study two-player security games which can 
be viewed as sequences of nonzero-sum matrix games played 
by an Attacker and a Defender. The evolution of the game 
is based on a stochastic fictitious play process, where players 
do not have access to each other's payoff matrix. Each has to 
observe the other's actions up to present and plays the action 
generated based on the best response to these observations. In a 
regular fictitious play process, each player makes a maximum 
likelihood estimate of her opponent's mixed strategy, which 
results in a time-varying update based on the previous estimate 
and current action. In this paper, we explore an alternative 
scheme for frequency update, whose mean dynamic is instead 
time-invariant. We examine convergence properties of the mean 
dynamic of the fictitious play process with such an update 
scheme, and establish local stability of the equilibrium point 
when both players are restricted to two actions. We also propose 
an adaptive algorithm based on this time-invariant frequency 
update. 

I. Introduction 

Game theory has recently been used as an effective tool 
to model and solve many security problems in computer and 
communication networks. In a noncooperative matrix game 
between an Attacker and a Defender, if the payoff matrices 
are assumed to be known to both players, each player can 
compute the set of Nash equilibria of the game and play 
one of these strategies to maximize her expected gain (or 
minimize her expected loss). However, in practice, the players 
do not necessarily have full knowledge of each other's payoff 
matrix. For repeated games, a mechanism called fictitious 
play (FP) can be used for each player to learn her opponent's 
motivations. In an FP process, each player observes all the 
actions and makes estimates of the mixed strategy of her 
opponent. At each stage, she updates this estimate and plays 
the pure strategy that is the best response (or generated based 
on the best response) to the current estimate of the other's 
mixed strategy. It can be seen that in an FP process, if one 
player plays a fixed strategy (either of the pure or mixed 
type), the other player's sequence of strategies will converge 
to the best response to this fixed strategy. Furthermore, it 
has been shown that, for many classes of games, such an 
FP process eventually leads both players to play a Nash 
equilibrium (NE). 

Specifically, we examine a two-player game, where an 
Attacker (denoted as player 1 or P1) and a Defender (denoted 
as player 2 or P2) participate in a discrete-time repeated 
nonzero-sum matrix game. In a general setting, the Attacker 
has m possible actions and the Defender has n possible 
actions to choose from. For example, when m = n = 2, the 
Attacker's actions could be to attack one node in a two-node 
network, and those of the Defender are to defend one of 
these two nodes. Players do not have access to each other's 
payoff function. They adjust their strategies based on each 
other's actions which they observe. 

In a stochastic FP process, each player makes a maximum 
likelihood estimate of her opponent's mixed strategy. As 
will be seen later on, this will result in a time-varying update 
of the opponent's empirical frequency, where the weight of 
the action at time step $k$ is $1/k$. In a practical repeated 
security game, however, we notice a couple of possible 
complications. First, players may not have exact and 
synchronized time steps. Second, each player may want to 
adjust the weight of the other's current action to converge 
either faster or more accurately to the equilibrium. A more 
flexible scheme to update the estimate of the mixed strategy 
may be needed in such situations. Motivated by these practical 
considerations, we examine in this paper a time-invariant 
frequency update mechanism for fictitious play. Also, as a 
side note, such a time-invariant update mechanism will allow 
us to use the analysis tools applicable only to time-invariant 
systems. 

Security games have been examined extensively in a large 
number of papers, see for example, [1], [2], [3], [7], [18], 
[14]. Surveys on applications of game theory to network 
security can be found in [6], [17]. Relevant literature on 
fictitious play can be found in [16], [12], [5], [19], [20], 
[11], [13], [15]. A comprehensive exposition of learning in 
games can be found in [8]. 

The rest of this paper is organized as follows. In Section II, 
we provide an overview of the static game and the standard 
stochastic FP process, and then introduce the stochastic FP 
with time-invariant frequency update. The analysis for FP 
with time-invariant frequency update is given in Section III. 
In Section IV, we introduce an adaptive algorithm based on 
the time-invariant FP process. Next, simulation results are 
given in Section V. Finally, some concluding remarks will 
end the paper. 



II. Fictitious Play with Time-Invariant 
Frequency Update 

In this section, we first present an overview of two-player 
static games, then the concept of Stochastic Fictitious 
Play with Time-Varying Frequency Update (TVFU-FP) [19], 
[20], [13], [15], and finally the concept of Stochastic Fictitious 
Play with Time-Invariant Frequency Update (TIFU-FP). 
While we introduce both the classical version and the stochastic 
version of static games, we restrict ourselves to only stochastic 
fictitious play in Subsections II-B and II-C and in the rest of 
the paper. 

A. Static Games 

We consider here static security games, where each player 
$P_i$, $i = 1, 2$, has two possible actions (or pure strategies). We 
use $v_i$ to denote the action of $P_i$. Let $\Delta(2)$ be the simplex 
in $\mathbb{R}^2$, i.e., 

$$\Delta(2) = \{ s \in \mathbb{R}^2 \mid s_1, s_2 \ge 0 \text{ and } s_1 + s_2 = 1 \}. \tag{1}$$

Each $v_i$ takes value in the set of (two) vertices of $\Delta(2)$: 
$v_i = [1\ 0]^T$ for the first action, and $v_i = [0\ 1]^T$ for the 
second action. In a static game, player $P_i$ selects an action 
$v_i$ according to a mixed strategy $p_i \in \Delta(2)$. The (instant) 
payoff for player $P_i$ is $v_i^T M_i v_{-i} + \tau_i H(p_i)$ (see footnote 1), where $M_i$ is 
the payoff matrix of $P_i$, and $H(p_i)$ is the entropy of the 
probability vector $p_i$, $H(p_i) = -p_i^T \log(p_i)$. The weighted 
entropy $\tau_i H(p_i)$ with $\tau_i \ge 0$ is introduced to boost mixed 
strategies. In a security game, $\tau_i$ signifies how much player 
$i$ wants to randomize its actions, and thus is not necessarily 
known to the other player. Also, for $\tau_1 = \tau_2 = 0$ (referred 
to as classical FP), the best response mapping can be set-valued, 
while it has a unique value when $\tau_i > 0$ (referred to 
as stochastic FP). For a pair of mixed strategies $(p_1, p_2)$, the 
utility functions are given by the expected payoffs: 

$$U_i(p_i, p_{-i}) = E[v_i^T M_i v_{-i}] + \tau_i H(p_i) = p_i^T M_i p_{-i} + \tau_i H(p_i). \tag{2}$$

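Now, the best response mappings $\beta_i : \Delta(2) \to \Delta(2)$ are 
defined as: 

$$\beta_i(p_{-i}) = \arg\max_{p_i \in \Delta(2)} U_i(p_i, p_{-i}). \tag{3}$$

If $\tau_i > 0$, the best response is unique as mentioned earlier, 
and is given by: 

$$\beta_i(p_{-i}) = \sigma\!\left(\frac{M_i p_{-i}}{\tau_i}\right), \tag{4}$$

where the soft-max function $\sigma : \mathbb{R}^2 \to \mathrm{Interior}(\Delta(2))$ is 
defined as 

$$(\sigma(x))_j = \frac{e^{x_j}}{e^{x_1} + e^{x_2}}, \quad j = 1, 2. \tag{5}$$

Note that $(\sigma(x))_j > 0$, and thus the range of the soft-max 
function is just the interior of the simplex. 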
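
As an illustration, the soft-max best response can be sketched in a few lines of Python. The function names `softmax2` and `best_response` are hypothetical, the matrix is the $M_1$ used later in Section V, and the opponent strategy (0.5, 0.5) is an arbitrary choice made only for this example.

```python
import math

def softmax2(x1, x2):
    """Two-component soft-max; shifting by the max avoids overflow."""
    m = max(x1, x2)
    e1, e2 = math.exp(x1 - m), math.exp(x2 - m)
    return (e1 / (e1 + e2), e2 / (e1 + e2))

def best_response(M, p_opp, tau):
    """Stochastic best response sigma(M p_opp / tau) for a 2x2 payoff matrix M."""
    u1 = M[0][0] * p_opp[0] + M[0][1] * p_opp[1]
    u2 = M[1][0] * p_opp[0] + M[1][1] * p_opp[1]
    return softmax2(u1 / tau, u2 / tau)

# Example values (assumption: chosen for illustration only).
M_example = [[1.0, 5.0], [3.0, 2.0]]
beta = best_response(M_example, (0.5, 0.5), tau=0.5)
print(beta)  # both components are strictly positive and sum to one
```

Because both components are strictly positive for any finite payoffs and any $\tau_i > 0$, the returned strategy is always completely mixed.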

Footnote 1: As is standard in the game theory literature, the index $-i$ is used to indicate 
those of other players, or the opponent in this case. 



Finally, a (mixed strategy) Nash equilibrium is defined to 
be a pair $(p_1^*, p_2^*) \in \Delta(2) \times \Delta(2)$ such that for all $p_i \in \Delta(2)$ 

$$U_i(p_i, p_{-i}^*) \le U_i(p_i^*, p_{-i}^*). \tag{6}$$

We can also write a Nash equilibrium $(p_1^*, p_2^*)$ as the fixed 
point of the best response mappings: 

$$p_i^* = \beta_i(p_{-i}^*), \quad i = 1, 2. \tag{7}$$

B. Stochastic Fictitious Play with Time-Varying Frequency 
Update 

From the static game described in Subsection II-A, we 
define the discrete-time TVFU-FP as follows. Suppose that 
the game is repeated at times $k \in \{0, 1, 2, \ldots\}$. The empirical 
frequency $q_i(k)$ of player $P_i$ is given by 

$$q_i(k+1) = \frac{1}{k+1} \sum_{j=0}^{k} v_i(j). \tag{8}$$

Using induction, we can prove the following recursive relation: 

$$q_i(k+1) = \frac{k}{k+1}\, q_i(k) + \frac{1}{k+1}\, v_i(k). \tag{9}$$

From the equations of discrete-time TVFU-FP (8), (9), the 
continuous-time version of the iteration can be written down 
as follows [13]: 

$$\dot{p}_i(t) = \beta_i(p_{-i}(t)) - p_i(t), \quad i = 1, 2. \tag{10}$$
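
The equivalence between the running average (8) and the recursion (9) can be checked numerically. The following is a minimal sketch; the function names and the short action history are hypothetical and serve only as an example.

```python
def empirical_frequency_recursive(actions):
    """Apply q(k+1) = k/(k+1) q(k) + 1/(k+1) v(k), starting from q(1) = v(0)."""
    q = list(actions[0])
    for k in range(1, len(actions)):
        v = actions[k]
        q = [k / (k + 1) * qj + vj / (k + 1) for qj, vj in zip(q, v)]
    return q

def empirical_frequency_direct(actions):
    """Plain average of the observed action vectors."""
    n = len(actions)
    return [sum(v[j] for v in actions) / n for j in range(2)]

# Hypothetical history of actions, each a vertex of the simplex.
actions = [(1, 0), (0, 1), (1, 0), (1, 0), (0, 1)]
q_rec = empirical_frequency_recursive(actions)
q_dir = empirical_frequency_direct(actions)
print(q_rec, q_dir)
```

The weight $1/(k+1)$ given to the newest action shrinks over time, which is exactly the time-varying behavior discussed in this subsection.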

C. Stochastic Fictitious Play with Time-Invariant Frequency 
Update 

In TVFU-FP, players take the maximum likelihood estimate 
of the mixed strategy of their opponent (8), (9). 
In TIFU-FP, the estimates of the mixed strategies will be 
calculated in a time-invariant manner as follows: 

$$r_i(1) = v_i(0), \tag{11}$$
$$r_i(k+1) = (1 - \eta)\, r_i(k) + \eta\, v_i(k), \tag{12}$$

where $\eta$ is a constant and $0 < \eta < 1$. For each player, this 
is basically the exponential smoothing formula used in time 
series analysis (see for example [9]). We will prove that 
with this formulation, at time $k$, $r_i(k)$ will be a weighted 
average of all the actions up to present of player $i$ where 
more recent actions have higher weights. Suppose that the 
payoff matrices of player 1 and player 2 are, respectively, 

$$M_1 = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad M_2 = \begin{pmatrix} e & g \\ f & h \end{pmatrix}. \tag{13}$$

Assumption 1: Based on a realistic security game, we can 
make the following assumptions: 

• $a < c$: When the Defender defends, the payoff of the 
Attacker will be decreased if it attacks. 

• $b > d$: When the Defender does not defend, the payoff 
of the Attacker will be increased if it attacks. 

• $e > f$: When the Attacker attacks, the payoff of the 
Defender will be decreased if it does not defend. 

• $g < h$: When the Attacker does not attack, the payoff 
of the Defender will be increased if it does not defend. 



1: Given payoff matrix $M_i$, coefficient $\tau_i > 0$, $i = 1, 2$. 
2: for $k \in \{0, 1, 2, \ldots\}$ do 
3:    Update the estimated frequency of the opponent 
      using (11), (12). 
4:    Compute the best response using (4). (Note that the 
      result is always a completely mixed strategy.) 
5:    Randomly play an action $v_i(k)$ according to the best 
      response mixed strategy $\beta_i(r_{-i}(k))$. 
6: end for 

Algorithm 1: Fictitious Play with Time-Invariant Frequency Update. 
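
Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the opening actions at $k = 0$ are assumed to be drawn uniformly (the algorithm does not specify them), and the payoff values are those reported in Section V.

```python
import math
import random

def best_response(M, p_opp, tau):
    """Soft-max best response to the opponent estimate p_opp (2x2 matrix M)."""
    u = [M[row][0] * p_opp[0] + M[row][1] * p_opp[1] for row in range(2)]
    top = max(u)  # shift by the max for numerical stability
    w = [math.exp((x - top) / tau) for x in u]
    return [w[0] / (w[0] + w[1]), w[1] / (w[0] + w[1])]

def sample_vertex(p, rng):
    """Return [1, 0] with probability p[0], otherwise [0, 1]."""
    return [1.0, 0.0] if rng.random() < p[0] else [0.0, 1.0]

def blend(r, v, eta):
    """Time-invariant update r <- (1 - eta) r + eta v."""
    return [(1 - eta) * rj + eta * vj for rj, vj in zip(r, v)]

def tifu_fp(M1, M2, tau1, tau2, eta, steps, seed=0):
    rng = random.Random(seed)
    # Assumption: opening actions are uniform, since no estimate exists at k = 0.
    v1 = sample_vertex([0.5, 0.5], rng)
    v2 = sample_vertex([0.5, 0.5], rng)
    r1, r2 = v1[:], v2[:]            # r_i(1) = v_i(0)
    count1, count2 = v1[0], v2[0]    # running counts of the first action
    for _ in range(steps):
        # Each player best-responds to the current estimate of the opponent.
        v1 = sample_vertex(best_response(M1, r2, tau1), rng)
        v2 = sample_vertex(best_response(M2, r1, tau2), rng)
        r1, r2 = blend(r1, v1, eta), blend(r2, v2, eta)
        count1 += v1[0]
        count2 += v2[0]
    n = steps + 1
    return r1, r2, count1 / n, count2 / n

M1 = [[1.0, 5.0], [3.0, 2.0]]
M2 = [[4.0, 1.0], [3.0, 5.0]]
r1, r2, q1, q2 = tifu_fp(M1, M2, tau1=0.5, tau2=0.3, eta=0.01, steps=2000)
print(r1, r2, q1, q2)
```

The function returns both the estimated frequencies $r_i$ and the empirical frequency of each player's first action, so the two can be compared on the same run.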



In TIFU-FP, both players employ Algorithm 1. The mean 
dynamic of the evolution of TIFU-FP can be written as: 

$$r_i(k+1) = (1 - \eta)\, r_i(k) + \eta\, \beta_i(r_{-i}(k)), \quad i = 1, 2. \tag{14}$$

Note that Equations (14) are just the evolution of the estimated 
frequencies; the empirical frequencies still evolve in a 
time-varying manner: 

$$q_i(k+1) = \frac{k}{k+1}\, q_i(k) + \frac{1}{k+1}\, v_i(k), \quad i = 1, 2. \tag{15}$$

The mean dynamic of empirical frequencies then can be 
written as 

$$q_i(k+1) = \frac{k}{k+1}\, q_i(k) + \frac{1}{k+1}\, \beta_i(r_{-i}(k)), \quad i = 1, 2. \tag{16}$$
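
The mean dynamic of the estimated frequencies is a deterministic map and can be iterated directly. The sketch below tracks only the first component of each estimate, which suffices because the components sum to one. The function names, the starting point (0.5, 0.5), and the step size 0.1 are assumptions made for illustration; the payoff values are those of Section V.

```python
import math

def beta_first(M, x_opp, tau):
    """First component of the soft-max best response when the
    opponent's estimated strategy is (x_opp, 1 - x_opp)."""
    u1 = M[0][0] * x_opp + M[0][1] * (1 - x_opp)
    u2 = M[1][0] * x_opp + M[1][1] * (1 - x_opp)
    return 1.0 / (1.0 + math.exp((u2 - u1) / tau))

def mean_dynamic(M1, M2, tau1, tau2, eta, steps, x0=0.5, y0=0.5):
    """Iterate the mean dynamic for x = r_1^1 and y = r_2^1."""
    x, y = x0, y0
    trajectory = [(x, y)]
    for _ in range(steps):
        # Both components are updated simultaneously from the old (x, y).
        x, y = ((1 - eta) * x + eta * beta_first(M1, y, tau1),
                (1 - eta) * y + eta * beta_first(M2, x, tau2))
        trajectory.append((x, y))
    return trajectory

M1 = [[1.0, 5.0], [3.0, 2.0]]
M2 = [[4.0, 1.0], [3.0, 5.0]]
x_end, y_end = mean_dynamic(M1, M2, 0.5, 0.3, eta=0.1, steps=2000)[-1]
print(round(x_end, 2), round(y_end, 2))
```

With this small step size the iterates spiral in toward a fixed point of the best response mappings.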

III. Analysis 
A. Nash Equilibrium of the Static Game 

We start the analysis with the following result for the static 
games given in Subsection II-A. 

Proposition 1: The static 2-player 2-action game in II-A 
with Assumption 1 and $\tau_1, \tau_2 > 0$ admits a unique Nash 
equilibrium. 



Fig. 1. Static 2-player 2-action game in II-A with Assumption 1 and 
$\tau_1, \tau_2 > 0$ - Best response mappings. (Panel title: best responses for 
a = 1, b = 5, c = 3, d = 2, e = 4, f = 3, g = 1, h = 5, $\tau_1$ = 0.5, $\tau_2$ = 0.3.) 



Proof: In what follows, let $r_i = (r_i^1, r_i^2)^T$, 
$\beta_1(r_2) = (\beta_1^1(r_2), \beta_1^2(r_2))^T$, and 
$\beta_2(r_1) = (\beta_2^1(r_1), \beta_2^2(r_1))^T$. We first use the Brouwer fixed point 
theorem (see for example [4]) to prove the existence of 
a Nash equilibrium, then use monotonicity of $\beta_1^1(r_2^1)$ and 
$\beta_2^1(r_1^1)$ to prove that the fixed point is unique. Here we write 
$\beta_1^1(r_2^1)$ as a scalar-valued function of the only independent 
variable $r_2^1$, using the fact that $r_2 \in \Delta(2)$, or $r_2^1 + r_2^2 = 1$. 
Function $\beta_2^1(r_1^1)$ is defined similarly. Also, as $r_1, r_2 \in \Delta(2)$, 
a pair $(r_1^1, r_2^1)$ completely specifies the estimated frequencies 
of the players. As seen from Equation (7), a Nash equilibrium 
$(r_1, r_2)$ is a fixed point of the best response mapping: 

$$r_i = \beta_i(r_{-i}), \quad i = 1, 2.$$

It suffices to write this mapping as $r = \beta(r)$, where 
$r = (r_1^1, r_2^1)^T$. Specifically, the mapping $\beta$ can be detailed as: 

$$r_1^1 = \beta_1^1(r_2^1) = \frac{e^{\frac{1}{\tau_1}[a r_2^1 + b(1 - r_2^1)]}}{e^{\frac{1}{\tau_1}[a r_2^1 + b(1 - r_2^1)]} + e^{\frac{1}{\tau_1}[c r_2^1 + d(1 - r_2^1)]}}.$$

It can be seen that $\beta_1^1(r_2^1) \in (0, 1)$. Similarly, we have 

$$r_2^1 = \beta_2^1(r_1^1) \in (0, 1).$$

Thus $\beta$ is a transformation from $[0,1]^2$ to $[0,1]^2$, which 
is a compact convex set. As both mappings $\beta_1^1$ and $\beta_2^1$ 
that constitute $\beta$ are continuous, $\beta$ is also a continuous 
transformation. Using the Brouwer fixed point theorem, there 
exists at least one fixed point $\bar{r}$ such that $\bar{r} = \beta(\bar{r})$, which is 
a Nash equilibrium of the static game. Now we examine the 
derivatives of $\beta_1^1(r_2^1)$ and $\beta_2^1(r_1^1)$ with respect to their own 
independent variables: 

$$\frac{d\beta_1^1(r_2^1)}{dr_2^1} = \frac{1}{\tau_1}\,[(a - c) + (d - b)]\,\beta_1^1(r_2)\,\beta_1^2(r_2),$$

$$\frac{d\beta_2^1(r_1^1)}{dr_1^1} = \frac{1}{\tau_2}\,[(e - f) + (h - g)]\,\beta_2^1(r_1)\,\beta_2^2(r_1).$$

From Assumption 1, $(a - c) + (d - b) < 0$ and 
$(e - f) + (h - g) > 0$. Thus $\beta_1^1(r_2^1)$ is strictly decreasing in 
$r_2^1$, and $\beta_2^1(r_1^1)$ is strictly increasing in $r_1^1$. Now suppose that 
there exist two distinct Nash equilibria, with first components $(r_1', r_2')$ and $(r_1'', r_2'')$. 
Obviously, $r_1' \ne r_1''$, otherwise we will have $r_2' = r_2''$, and 
these two points coincide. Without loss of generality, assume 
that $r_1' < r_1''$. As $\beta_2^1(r_1^1)$ is strictly increasing in $r_1^1$, we have 
that $r_2' < r_2''$. However, $\beta_1^1(r_2^1)$ is strictly decreasing in $r_2^1$, 
so $r_1' > r_1''$, which contradicts the initial assumption. 
Thus the Nash equilibrium is unique. ■ 
We illustrate in Figure 1 the curves $\beta_1^1(r_2^1)$ and $\beta_2^1(r_1^1)$ with 
the values of $M_1$, $M_2$, $\tau_1$, and $\tau_2$ as shown. The intersection 
of these two curves is the Nash equilibrium of the static 
game. 
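
Since one best response curve is strictly decreasing and the other strictly increasing, their intersection can be located by bisection. The sketch below is one possible way to do this and is not taken from the paper. It assumes the layout $M_1 = [a\ b;\ c\ d]$ and $M_2 = [e\ g;\ f\ h]$, which is consistent with the values in the caption of Fig. 1 and the matrices of Section V; the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def beta1(y, a, b, c, d, tau1):
    """First component of the Attacker's best response to r_2^1 = y."""
    return sigmoid(((a - c) * y + (b - d) * (1 - y)) / tau1)

def beta2(x, e, f, g, h, tau2):
    """First component of the Defender's best response to r_1^1 = x."""
    return sigmoid(((e - f) * x + (g - h) * (1 - x)) / tau2)

def nash_equilibrium(a, b, c, d, e, f, g, h, tau1, tau2, tol=1e-12):
    """Bisection on x -> beta1(beta2(x)) - x, which is strictly
    decreasing under Assumption 1 and changes sign on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta1(beta2(mid, e, f, g, h, tau2), a, b, c, d, tau1) > mid:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return x, beta2(x, e, f, g, h, tau2)

x_star, y_star = nash_equilibrium(1, 5, 3, 2, 4, 3, 1, 5, tau1=0.5, tau2=0.3)
print(round(x_star, 2), round(y_star, 2))
```

For the parameter values of Fig. 1, the result agrees to two decimals with the equilibrium reported in Section V.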

B. Estimated Frequencies and Empirical Frequencies 

We present here two propositions for TIFU-FP: The first 
shows the weights of each player's actions in the estimated 
frequency, and the second shows the relationship between 
estimated frequencies and empirical frequencies. 

Proposition 2: For $k \ge 2$, the estimated frequencies in 
TIFU-FP constructed using (11), (12) will satisfy 

$$r_i(k) = (1 - \eta)^{k-1} v_i(0) + (1 - \eta)^{k-2} \eta\, v_i(1) + (1 - \eta)^{k-3} \eta\, v_i(2) + \ldots + (1 - \eta)\, \eta\, v_i(k-2) + \eta\, v_i(k-1), \tag{17}$$

where $i = 1, 2$. 

Proof: This result can be proved using induction. ■ 
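
The weighted-sum form in Proposition 2 can be checked numerically against the recursive update. The following sketch works with the first component of the action vectors only; the function names, the action history, and the value of the step size are hypothetical.

```python
def smooth(actions, eta):
    """Recursive update: r(1) = v(0), r(k+1) = (1 - eta) r(k) + eta v(k).
    Returns r(k) for k = len(actions)."""
    r = actions[0]
    for v in actions[1:]:
        r = (1 - eta) * r + eta * v
    return r

def closed_form(actions, eta):
    """Weighted sum of Proposition 2 for k = len(actions)."""
    k = len(actions)
    total = (1 - eta) ** (k - 1) * actions[0]
    for j in range(1, k):
        total += (1 - eta) ** (k - 1 - j) * eta * actions[j]
    return total

# Hypothetical first components of v_i(0), ..., v_i(5).
history = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
eta = 0.2
r_recursive = smooth(history, eta)
r_closed = closed_form(history, eta)
print(r_recursive, r_closed)
```

The weights form a geometric sequence that sums to one, so the most recent action always carries weight $\eta$ regardless of $k$.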
Proposition 3: In TIFU-FP, the empirical frequencies are 
related to the estimated frequencies calculated using (11), 
(12) through the following equation: 

$$q_i(k+1) = \frac{1}{k+1} \left[ \frac{r_i(k+1) - (1 - \eta)\, r_i(1)}{\eta} + r_i(1) + r_i(2) + \ldots + r_i(k) \right], \quad i = 1, 2. \tag{18}$$

Proof: This result can be proved by writing the actions 
of player $P_i$ at times $0, 1, \ldots, k$ in terms of the estimated 
frequencies at times $1, 2, \ldots, (k + 1)$. ■ 
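
The idea of the proof, inverting the time-invariant update to recover each action from consecutive estimates, can be illustrated numerically. The sketch below uses scalar first components; the function names and the data are hypothetical, and the bracketed expression is simply the sum of the recovered actions, namely $v_i(0) = r_i(1)$ and $v_i(j) = [r_i(j+1) - (1 - \eta)\, r_i(j)]/\eta$ for $j \ge 1$.

```python
def estimated_frequencies(actions, eta):
    """Return [r(1), ..., r(k+1)] for scalar actions v(0), ..., v(k)."""
    r = [actions[0]]
    for v in actions[1:]:
        r.append((1 - eta) * r[-1] + eta * v)
    return r

def empirical_from_estimates(r, eta):
    """Empirical frequency q(k+1) computed only from r(1), ..., r(k+1)."""
    total = (r[-1] - (1 - eta) * r[0]) / eta + sum(r[:-1])
    return total / len(r)

# Hypothetical first components of the played actions.
history = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0]
eta = 0.3
r_seq = estimated_frequencies(history, eta)
q_from_r = empirical_from_estimates(r_seq, eta)
q_direct = sum(history) / len(history)
print(q_from_r, q_direct)
```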

C. Convergence Properties of the Mean Dynamic in TIFU-FP 

Theorem 1: Consider a TIFU-FP with Assumption 1 and 
$\tau_1, \tau_2 > 0$. The mean dynamic given in Equations (14) is 
asymptotically stable if and only if 

$$\eta < \frac{2}{\dfrac{[(c - a) + (b - d)]\,[(e - f) + (h - g)]}{\tau_1 \tau_2}\; \bar{r}_1^1\, \bar{r}_1^2\, \bar{r}_2^1\, \bar{r}_2^2 + 1}. \tag{19}$$

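Proof: As can be seen in Equations (14), this is a 
deterministic nonlinear discrete-time time-invariant system. 
We linearize the system at the fixed point and examine 
stability properties of the linearized system using techniques 
described in standard textbooks for nonlinear systems (e.g., 
[10]). Using the mean dynamic (14), where 

$$r_1(k) = \begin{pmatrix} r_1^1(k) \\ r_1^2(k) \end{pmatrix}, \quad r_2(k) = \begin{pmatrix} r_2^1(k) \\ r_2^2(k) \end{pmatrix}, \tag{20}$$

it can be seen that a pair $(\bar{r}_1, \bar{r}_2)$ that satisfies $\bar{r}_i = \beta_i(\bar{r}_{-i})$, 
$i = 1, 2$, is a fixed point of the system. Since $r_i^2 = 1 - r_i^1$, 
it suffices to consider $r = (r_1^1, r_2^1)^T$ and to write the mean dynamic 
as $r(k+1) = F(r(k))$, with $F_1(r) = (1 - \eta)\, r_1^1 + \eta\, \beta_1^1(r_2^1)$ and 
$F_2(r) = (1 - \eta)\, r_2^1 + \eta\, \beta_2^1(r_1^1)$. Consider 
the Jacobian matrix 

$$J = \frac{\partial F(r)}{\partial r} = \begin{pmatrix} \dfrac{\partial F_1(r)}{\partial r_1^1} & \dfrac{\partial F_1(r)}{\partial r_2^1} \\ \dfrac{\partial F_2(r)}{\partial r_1^1} & \dfrac{\partial F_2(r)}{\partial r_2^1} \end{pmatrix}.$$

We have that 

$$\frac{\partial F_1(r)}{\partial r_1^1} = \frac{\partial F_2(r)}{\partial r_2^1} = 1 - \eta, \qquad \frac{\partial F_1(r)}{\partial r_2^1} = \eta\, \frac{d\beta_1^1(r_2^1)}{dr_2^1}.$$

Recall that $\beta_1(r_2) = \sigma(M_1 r_2 / \tau_1)$, where 

$$M_1 r_2 = \begin{pmatrix} a r_2^1 + b(1 - r_2^1) \\ c r_2^1 + d(1 - r_2^1) \end{pmatrix}.$$

Thus 

$$\beta_1^1(r_2^1) = \frac{e^{\frac{1}{\tau_1}[a r_2^1 + b(1 - r_2^1)]}}{e^{\frac{1}{\tau_1}[a r_2^1 + b(1 - r_2^1)]} + e^{\frac{1}{\tau_1}[c r_2^1 + d(1 - r_2^1)]}}.$$

Then 

$$\frac{\partial F_1(r)}{\partial r_2^1} = \frac{\eta}{\tau_1}\,[(a - c) + (d - b)]\,\beta_1^1(r_2)\,\beta_1^2(r_2).$$

At the fixed point $(\bar{r}_1, \bar{r}_2)$, where $\beta_1(\bar{r}_2) = \bar{r}_1$, we can write 

$$\frac{\partial F_1(r)}{\partial r_2^1} = \frac{\eta}{\tau_1}\,[(a - c) + (d - b)]\,\bar{r}_1^1\, \bar{r}_1^2.$$

Similarly, 

$$\frac{\partial F_2(r)}{\partial r_1^1} = \frac{\eta}{\tau_2}\,[(e - f) + (h - g)]\,\bar{r}_2^1\, \bar{r}_2^2.$$

Using the conditions for local stability, $|\rho_{1,2}| < 1$, where 
$\rho_{1,2}$ are eigenvalues of the Jacobian matrix, we finally have 
the condition in Equation (19). ■ 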
Remark 1: Although this theorem only mentions the 
asymptotic stability of the estimated frequencies (of the mean 
dynamic), once these estimated frequencies converge to the 
Nash equilibrium, the best responses will also converge to 
the Nash equilibrium, and so will the empirical frequencies 
in the long run. 
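
The stability threshold of Theorem 1 can be evaluated numerically once the equilibrium is known. The sketch below is illustrative only: the function names are hypothetical, and it plugs in the rounded equilibrium values (0.79 and 0.47) reported in Section V. It also relies on the observation that, under Assumption 1, the two off-diagonal Jacobian entries have opposite signs, so the eigenvalues form a complex-conjugate pair with real part $1 - \eta$.

```python
import math

def loop_gain(a, b, c, d, e, f, g, h, tau1, tau2, x_bar, y_bar):
    """Magnitude of the product of the two off-diagonal Jacobian entries,
    divided by eta^2, with x_bar = r_1^1 and y_bar = r_2^1 at the fixed point."""
    coeff = ((c - a) + (b - d)) * ((e - f) + (h - g)) / (tau1 * tau2)
    return coeff * x_bar * (1 - x_bar) * y_bar * (1 - y_bar)

def stability_threshold(gain):
    """Largest step size for which both eigenvalues lie inside the unit circle."""
    return 2.0 / (gain + 1.0)

def eigenvalue_modulus(eta, gain):
    """Modulus of the eigenvalues 1 - eta +/- i * eta * sqrt(gain)."""
    return math.sqrt((1 - eta) ** 2 + eta ** 2 * gain)

gain = loop_gain(1, 5, 3, 2, 4, 3, 1, 5, 0.5, 0.3, x_bar=0.79, y_bar=0.47)
eta0 = stability_threshold(gain)
print(round(eta0, 4))
print(eigenvalue_modulus(0.25, gain) < 1.0, eigenvalue_modulus(0.26, gain) < 1.0)
```

Setting the squared modulus $(1 - \eta)^2 + \eta^2 G$ below one, with $G$ the value returned by `loop_gain`, gives $\eta < 2/(G + 1)$, which is the form of the bound in the theorem.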

IV. Adaptive Fictitious Play 

In this section we examine an adaptive FP algorithm 
(hereafter referred to as AFP) based on FP with Time-Invariant 
Frequency Update, where the step size $\eta$ is piecewise 
constant and decreased over time. For the specific 
implementation shown in Algorithm 2, the step size is either 
kept fixed or halved, based on the variance of empirical 
frequency in the previous time window. 

V. Simulation Results 

We present in this section some simulation results for 
TIFU-FP and AFP where the payoff matrices and entropy 
coefficients are chosen to be 

$$M_1 = \begin{pmatrix} 1 & 5 \\ 3 & 2 \end{pmatrix}, \quad M_2 = \begin{pmatrix} 4 & 1 \\ 3 & 5 \end{pmatrix}, \quad \tau_1 = 0.5, \quad \tau_2 = 0.3.$$

The Nash equilibrium of the static game is (0.79, 0.21) 
and (0.47, 0.53). The local stability threshold (the RHS 
of Equation (19)) is $\eta_0 = 0.2536$. For simplicity, in the 
graphs shown here, we only plot the first component of each 
frequency vector. 

A. Fictitious Play with Time-Invariant Frequency Update 

Some simulation results for the mean dynamic of TIFU-FP 
(Equations (14)) are given in Figures 2 and 3. When 
$\eta = 0.25 < \eta_0 = 0.2536$, the estimated frequencies 
are shown in Figure 2. The simulation results show that 
both estimated frequencies and empirical frequencies (not 
presented here due to space limitations) converge to the NE 
as expected. When $\eta = 0.26 > \eta_0$, however, the estimated 
frequencies no longer converge. These simulations thus 
confirm the theoretical result in Theorem 1. It is also worth 
noting that the empirical frequencies in the case $\eta = 0.26$ 
still converge to the NE. Unlike the mean dynamic, a 
stochastic TIFU-FP process (generated with Algorithm 1) 
exhibits significant random fluctuations. The graph in Figure 
4 shows the estimated frequencies of such a process where 
we choose $\eta = 0.01$. However, the empirical frequencies 
(whose graph is not shown here due to space limitations) 
still converge to the NE. 

Fig. 2. Mean dynamic of FP with Time-Invariant Frequency Update - 
Estimated Frequencies, $\eta = 0.25$, $\eta_0 = 0.2536$. (Panel title: Mean TIFU-FP - 
Estimated Frequencies, $\eta = 0.25$, $\tau_1 = 0.5$, $\tau_2 = 0.3$.) 

Fig. 3. Mean dynamic of FP with Time-Invariant Frequency Update - 
Estimated Frequencies, $\eta = 0.26$, $\eta_0 = 0.2536$. (Panel title: Mean TIFU-FP - 
Estimated Frequencies, $\eta = 0.26$, $\tau_1 = 0.5$, $\tau_2 = 0.3$.) 

Fig. 4. Stochastic FP with Time-Invariant Frequency Update - Estimated 
Frequencies, $\eta = 0.01$. (Panel title: Stochastic TIFU-FP - Estimated 
Frequencies, $\eta = 0.01$, $\tau_1 = 0.5$, $\tau_2 = 0.3$.) 

1: Given payoff matrix $M_i$, coefficient $\tau_i$, $i = 1, 2$, initial 
   step size $\eta_0$, minimum step size $\eta_{\min}$, and window size $T$. 
2: for $k \in \{0, 1, 2, \ldots\}$ do 
3:    Update the estimated frequency of the opponent, 
      $r_{-i}$, using (11), (12). 
4:    Compute the best response mixed strategy 
      $\beta_i(r_{-i}(k))$ using (4). 
5:    Randomly play an action $a_i(k)$ according to the best 
      response mixed strategy $\beta_i(r_{-i}(k))$, such that the 
      expectation $E[a_i(k)] = \beta_i(r_{-i}(k))$. 
6:    if at the end of a time window, $\mathrm{mod}(k, T) = 0$, then 
7:       Compute the standard deviation of the estimated 
         frequencies ($\mathrm{std}_i$) in the time window 
         $[r_{-i}(k - T + 1), \ldots, r_{-i}(k)]$ (using an unbiased estimator): 

         $$m_i(k) = \frac{1}{T} \sum_{h = k - T + 1}^{k} r_{-i}(h),$$

         $$\mathrm{std}_i(k) = \sqrt{\frac{\sum_{h = k - T + 1}^{k} \left( r_{-i}(h) - m_i(k) \right)^2}{T - 1}}.$$

8:       if the computed $\mathrm{std}_i(k)$ has decreased compared 
         to previous time window then 
9:          Decrease step size: $\eta = 0.5\, \eta$ and 
            $\eta = \max(\eta, \eta_{\min})$. 
10:      else 
11:         Keep step size $\eta$ constant. 
12:      end if 
13:   end if 
14: end for 

Algorithm 2: Adaptive Fictitious Play 
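
The end-of-window step-size rule of Algorithm 2 can be sketched as follows. This is an illustrative reading of the algorithm, not the authors' code: the function name is hypothetical, the window holds scalar first components of the estimated frequencies, and the step size is assumed to be kept unchanged at the end of the very first window, when no previous standard deviation is available.

```python
import statistics

def update_step_size(eta, window, prev_std, eta_min):
    """One end-of-window update. Returns (new_eta, current_std)."""
    # statistics.stdev uses the (T - 1) denominator, i.e. the unbiased
    # variance estimator mentioned in Algorithm 2.
    cur_std = statistics.stdev(window)
    # Assumption: with no previous window (prev_std is None), keep eta.
    if prev_std is not None and cur_std < prev_std:
        eta = max(0.5 * eta, eta_min)
    return eta, cur_std

# Hypothetical windows of estimated frequencies (first components).
eta1, std1 = update_step_size(0.1, [0.2, 0.8, 0.3, 0.7], None, 0.0005)
eta2, std2 = update_step_size(eta1, [0.45, 0.55, 0.5, 0.5], std1, 0.0005)
eta3, std3 = update_step_size(eta2, [0.1, 0.9, 0.1, 0.9], std2, 0.0005)
print(eta1, eta2, eta3)
```

Halving only when the fluctuation has decreased keeps the step size large while the estimates are still moving, and the floor at the minimum step size prevents the update from freezing entirely.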



B. Adaptive Fictitious Play 

Some simulation results for adaptive FP are shown in 
Figures 6 and 7. The payoff matrices and entropy coefficients 
are the same as those in V-A. Initial and minimum step sizes 
are chosen to be $\eta_0 = 0.1$ and $\eta_{\min} = 0.0005$, respectively. 
The time window for updating the step size is $T = 50$ steps. 
The evolution of the empirical frequencies is depicted in 
Figure 6, which shows that adaptive FP converges faster 
than the stochastic FP with time-varying frequency update 
(TVFU-FP) (Figure 5). We however remark that it is possible 
to incorporate a decreasing coefficient into the step size in 
TVFU-FP (which is originally $1/k$) to make the TVFU-FP 
process converge faster [11]. The update of the step size in 
adaptive FP is shown in Figure 7. Note that when compared 
to the step size $1/k$ in TVFU-FP, the step sizes in adaptive FP 
are higher in the beginning and smaller afterwards, resulting 
in aggressive convergence first and less fluctuation in the 
stable phase. 



Fig. 5. Stochastic FP with Time-Varying Frequency Update - Empirical 
Frequencies. (Panel title: Stochastic TVFU-FP - Empirical Frequencies, $\tau_1 = 0.5$, $\tau_2 = 0.3$.) 

Fig. 6. Adaptive Stochastic FP - Empirical Frequencies. 

Fig. 7. Adaptive Stochastic FP - Evolution of step size. (Panel title: 
Evolution of step size $\eta$ versus $1/k$.) 



VI. Conclusions 



In this paper, we have introduced a time-invariant scheme 
to estimate the frequency of the opponent's actions in a two- 
player two-action fictitious play process. We have proved 
local stability of the unique Nash equilibrium for the mean 
version of this FP dynamic. This frequency update scheme, 
when used adaptively, allows players to converge faster to 
the Nash equilibrium. For this two-player two-action FP, 
conditions for global stability, if they exist, are yet to be 
found. Also, having more than two possible actions for each 
player is an intriguing research extension. 